Whole-genome sequencing and the relative ease of transcript profiling have facilitated the collection and warehousing of immense quantities of expression data. However, a substantial proportion of genes are not yet functionally annotated, a problem that is particularly acute for transport proteins. In Arabidopsis, for example, only a minor fraction of the estimated 700 intracellular transporters have been identified at the molecular genetic level. Furthermore, only within the last couple of years have critical genes been identified, such as those encoding the final transport step required for the long-distance transport of sucrose and the first transporter of the core photorespiratory pathway. Here we describe how transcriptional coordination between genes of known function and non-annotated genes allows the identification of putative transporters, on the premise that such co-expressed genes tend to be functionally related. We additionally extend this approach to include phenotypic information from other levels of cellular organization, such as proteomic and metabolomic data, and provide case studies in which this approach has successfully been used to fill knowledge gaps in important metabolic pathways and physiological processes.
INTRODUCTION

Over the last 15 years or so, co-expression analysis has emerged as a powerful statistical tool based on the guilt-by-association approach. This approach assumes that if the transcript levels of a gene of unknown function correlate tightly with those of genes of known function, then the gene of unknown function very likely plays a role in the same biological process as the known genes (Tohge and Fernie, 2012; Saito et al., 2013; Stitt, 2013). There are a number of caveats to this approach, including the influence of the type of expression data used to construct the co-expression networks and of the statistical methods used to evaluate them. However, these have been discussed in detail elsewhere (see reviews by Usadel et al., 2009; Bordych et al., 2013; Stitt, 2013), and when they are properly considered this strategy can prove very effective.
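In its simplest form, the guilt-by-association premise amounts to a correlation test between expression profiles. The sketch below assumes Pearson correlation as the similarity measure and an arbitrary cutoff of r >= 0.8; the gene names and profiles are hypothetical toy values, not data from any of the studies cited here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def guilt_by_association(unknown, known, threshold=0.8):
    """Return (gene, r) for known genes whose profile correlates with `unknown`
    at r >= threshold, strongest association first."""
    scored = [(name, pearson(unknown, profile)) for name, profile in known.items()]
    hits = [(name, r) for name, r in scored if r >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical expression profiles across six conditions (toy values only).
known_genes = {
    "known_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "known_B": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "known_C": [1.0, 1.5, 3.2, 3.9, 5.1, 6.3],
}
unknown_gene = [2.0, 4.1, 6.0, 8.2, 9.9, 12.1]

for name, r in guilt_by_association(unknown_gene, known_genes):
    print(f"{name}: r = {r:.3f}")
```

In this toy case known_B, whose profile runs opposite to that of the unknown gene, falls below the cutoff. A real analysis would still need to address the caveats noted above, such as the choice of expression data.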

The earliest large-scale application of this approach was performed in yeast and proceeded in two steps. First, similarity scores were assigned to every possible gene pair based on a comparison of their expression levels across a wide range of conditions. Second, the resulting distance matrix, comprising all pairwise similarity scores, was organized ("clustered") in a manner allowing the facile identification of the genes with the most similar expression patterns (Eisen et al., 1998). Given its simplicity, this approach was rapidly adopted in microbial (Fribourg et al., 2001; Gasch and Eisen, 2002; Mao et al., 2005; Zhang et al., 2005), mammalian (Voehringer et al., 2000; Altman and Raychaudhuri, 2001; Raychaudhuri et al., 2001; Taniguchi et al., 2002; Lee et al., 2004; Li et al., 2004; Prieto et al., 2008), and plant research (Maleck et al., 2000; Schaffer et al., 2001; Goda et al., 2008). For plant research, several web-based tools, including ATTED-II (Obayashi et al., 2009; Obayashi et al., 2011), AraNet (Hwang et al., 2011), Expression Angler of the Bio-Array Resource (BAR; Toufighi et al., 2005), KappaViewer (Sakurai et al., 2011), GeneCAT (Mutwil et al., 2008), Genevestigator (Zimmermann et al., 2004), and VirtualPlant (Katari et al., 2010), simplify this task yet further.

One of the strongest demonstrations of the power of this technique in plants comes from its early use in identifying additional genes involved in secondary cell wall and hemicellulose synthesis in Arabidopsis (Brown et al., 2005; Persson et al., 2005; Cocuron et al., 2007). These studies used the three major cellulose synthase A (CESA) genes as bait (i.e., as expression profile queries) to construct networks and thereby isolate novel genes displaying similar expression patterns. The biological functions of many of these genes have since been confirmed and documented in a number of publications (Bringmann et al., 2012; Ruprecht and Persson, 2012; Sanchez-Rodriguez et al., 2012). A second area of plant metabolism in which the approach has proven highly informative is secondary metabolism. This is perhaps not surprising, since secondary metabolism is often regulated directly at the transcriptional level and is known to be controlled by a wide range of transcription factors, including the MYB transcription factors. The initial use
of this approach in this area was the construction of a flavonoid co-expression network to identify a flavonol-3'-O-methyltransferase (Tohge et al., 2007) and a flavonol-7-O-rhamnosyltransferase (Yonekura-Sakakibara et al., 2007). It has subsequently been used to find other flavonoid genes (Yonekura-Sakakibara et al., 2008), glucosinolate MYB regulators (Hirai et al., 2007), an anthocyanin glucosyltransferase (Yonekura-Sakakibara et al., 2012), a phospholipid sugar transferase (Okazaki et al., 2009), and lignin biosynthetic genes (Ehlting et al., 2005; Vanholme et al., 2013). These studies have thus allowed considerable advances in defining genes associated with metabolism per se. That said, even for primary metabolism, many essential proteins remain for which the corresponding gene has not yet been identified. This is particularly problematic for transport proteins: estimates based on entirely different approaches suggest that a total of 6500 membrane transporters exist in Arabidopsis (Schwacke et al., 2003), while 700 intracellular transporters are required merely to maintain the primary metabolic network of the same species (Mintz-Oron et al., 2012). Recent articles concerning chloroplast, peroxisomal, vacuolar, ER, and plasma membrane transport all indicate gradual progress in the functional elucidation of transport proteins of all types (Liu and Bush, 2006; Palmieri et al., 2011; Rieder and Neuhaus, 2011; Weber and Linka, 2011; Martinoia et al., 2012; Hoffmann et al., 2013). However, the number of identified transporter proteins falls well short of either of the predicted numbers given above: there is a vast number of putative transport proteins, but the great majority have only homology-based annotations or no functional characterization whatsoever. The recent review by Schroeder et al. (2013) reiterates the importance of transporters in metabolic engineering strategies and accordingly identifies the definition of the permeome as being of clear strategic importance for sustaining crop productivity. As one approach toward this goal, in this mini-review we detail (i) the potential of co-expression as a stand-alone approach for aiding the definition of metabolite transporters and (ii) how other phenotypic data can be integrated with gene expression data to increase the chances of successful gene annotation.
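The two-step procedure of Eisen et al. (1998) described above, pairwise similarity scoring followed by clustering, can be sketched as follows. We assume Pearson correlation as the similarity score, 1 - r as the distance, and average linkage as the merging rule; the details of the original implementation and of the web tools listed above may differ. Gene names and profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom


def distance_matrix(profiles):
    """Step 1: score every gene pair; store distance = 1 - r under an alphabetically sorted key."""
    return {
        (g, h): 1.0 - pearson(profiles[g], profiles[h])
        for g, h in combinations(sorted(profiles), 2)
    }


def average_linkage(profiles, n_clusters):
    """Step 2: repeatedly merge the two clusters with the smallest mean
    pairwise distance until n_clusters remain."""
    dist = distance_matrix(profiles)

    def d(g, h):
        return dist[(g, h)] if g < h else dist[(h, g)]

    clusters = [[g] for g in sorted(profiles)]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(d(g, h) for g in clusters[i] for h in clusters[j])
            mean_dist = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or mean_dist < best[0]:
                best = (mean_dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical profiles: a bait gene, two co-expressed genes, two anti-correlated genes.
profiles = {
    "bait": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "cand1": [1.1, 2.2, 2.9, 4.2, 5.1, 5.8],
    "cand2": [0.9, 1.8, 3.1, 4.0, 4.8, 6.1],
    "other1": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "other2": [5.8, 5.1, 4.2, 2.9, 2.1, 0.9],
}
print(average_linkage(profiles, n_clusters=2))
```

Genes that end up in the same cluster as a bait (here cand1 and cand2) are the ones a bait-based study such as the CESA work would carry forward for validation.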

IDENTIFICATION OF TRANSPORTERS IN THE MODEL PLANT ARABIDOPSIS

As mentioned above, bioinformatic strategies based either on features of protein amino acid sequences or on the transport steps required for a functional subcellular metabolism have enabled us to set an upper limit on the number of metabolite transporters in the plant cell. Nevertheless, considerable research effort is still required to elucidate the function of these carriers. Among many important breakthroughs in transporter identification, critical advances have been made both in cloning the first transporters of the core photorespiratory pathway (Bordych et al., 2013; Pick et al., 2013) and of amino acid metabolism (Liu and Bush, 2006), and in the initial characterization of a glucosinolate transporter (Gigolashvili et al., 2009; Sawada et al., 2009), an epicatechin conjugate transporter (Marinova et al., 2007), and a monolignol transporter (Alejandro et al., 2012). In this section we detail the role of co-expression studies in these discoveries (Table 1).



Transporters involved in photorespiration 

The use of co-expression analysis to identify photorespiratory transporters has recently been expertly reviewed (Bordych et al., 2013), so we cover it only briefly here. In that analysis the gene PLGG1 (At1g32080) was ranked as a highly promising candidate transporter. plgg1-1 knockout plants develop chlorotic regions along the leaf lamina when grown under ambient air (NC) conditions, and the transporter was recently characterized as the plastidic glycerate/glycolate transporter (Pick et al., 2013). Similarly, the A BOUT DE SOUFFLE (BOU) protein was identified as a transporter involved in shuttling intermediates of the photorespiratory C2 cycle (Lawand et al., 2002; Eisenhut et al., 2013). bou knockout plants were shown to suffer in ambient air but to grow much like the wild type when kept under high CO2 conditions. Moreover, glycine levels were greatly increased relative to those of wild-type plants, while mitochondrial glycine degradation was strongly reduced in the mutant. Although the specific substrate transported by BOU has not been identified, the results collated to date suggest that it is likely to be a glycine decarboxylase co-factor (Bordych et al., 2013). A third candidate, the plastidial 2-oxoglutarate (2-OG)/malate transporter (AtDiT1), was identified through a co-expression analysis approach and its sequence homology with DiT2.1 (AtpDCT1; Taniguchi et al., 2002; Renne et al., 2003). The function of this gene was subsequently confirmed by phenotypic analysis of the knockout mutant (dit1), which suffered under normal growth conditions and displayed retarded development, small leaves, frequently emerging shoots, and decreased chlorophyll content (Kinoshita et al., 2011). AtDiT1 supplies the chloroplast with the 2-OG used by ferredoxin-dependent glutamate synthase (Fd-GOGAT) and forms a double-transporter system together with the AtpDCT1 protein. This protein thus participates, albeit one step removed, in the export of synthesized glutamate and the refixation of ammonium released by the photorespiratory cycle (Schneidereit et al., 2006; Kinoshita et al., 2010). Despite the success of these three examples, the functions of the other genes highlighted in this photorespiratory co-expression study remain to be confirmed.
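The candidate-ranking step behind results such as the PLGG1 prioritization can be sketched as scoring every non-bait gene by its mean correlation with a set of known pathway genes (the bait) and keeping only genes annotated as transporters. This scoring scheme is an assumption for illustration, not the exact procedure of the photorespiratory analysis; the gene identifiers, annotations, and profiles are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom


def rank_transporter_candidates(profiles, baits, annotations, keyword="transporter", top=10):
    """Score each non-bait gene whose annotation mentions `keyword` by its mean
    Pearson correlation with the bait genes, highest score first."""
    scored = []
    for gene, profile in profiles.items():
        if gene in baits or keyword not in annotations.get(gene, "").lower():
            continue
        score = mean(pearson(profile, profiles[b]) for b in baits)
        scored.append((gene, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


# Hypothetical bait genes (known pathway members) and candidate genes.
profiles = {
    "bait_1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "bait_2": [1.2, 2.1, 2.8, 4.1, 5.2, 5.9],
    "cand_T1": [0.8, 2.0, 3.1, 3.9, 5.0, 6.2],
    "cand_T2": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "cand_E1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.1],
}
annotations = {
    "cand_T1": "Putative membrane transporter",
    "cand_T2": "Transporter family protein",
    "cand_E1": "Hypothetical enzyme",
}
ranking = rank_transporter_candidates(profiles, ["bait_1", "bait_2"], annotations)
print(ranking)
```

The annotation filter is what restricts a general co-expression list to putative transporters; cand_E1 is strongly co-expressed but is dropped because it is not annotated as one.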

Bile acid transporter family 

The plastidic bile acid transporter 5 (BAT5) was associated with glucosinolate metabolism on the basis of its co-expression with known genes of glucosinolate metabolism (Gigolashvili et al., 2009; Sawada et al., 2009). Importantly, this was confirmed by the finding that loss of function and reduced expression of BAT5 resulted in considerably decreased glucosinolate levels (Gigolashvili et al., 2009; Sawada et al., 2009). However, sodium-coupled transport activity of recombinant BAT5 has yet to be demonstrated. Recently, glucosinolate transport to seeds was shown to be carried out by At3g47960, a member of the nitrate/peptide transporter (NRT/PTR) family, in an approach independent of co-expression analysis (Nour-Eldin and Halkier, 2013). Returning to the BAT family, a targeted variation on the co-expression theme led to the putative identification (and subsequent confirmation) of another family member, BAT1, as a plastidial sodium-dependent
pyruvate transporter (Furumoto et al., 2011).

Table 1 | Transporter genes presented in this review.

| Gene name | Description | AGI | Network | Reference |
|---|---|---|---|---|
| PLGG1 | Plastidial glycolate/glycerate translocator 1 | At1g32080 | Photorespiratory metabolism | Pick et al. (2013) |
| BOU | A BOUT DE SOUFFLE | At5g46800 | Photorespiratory metabolism | Eisenhut et al. (2013); Lawand et al. (2002) |
| AtDiT1 | Plastidial 2-oxoglutarate/malate transporter | At5g12860 | Photorespiratory metabolism | Taniguchi et al. (2002); Renne et al. (2003) |
| BAT5 | 2-keto acids transporter | At4g12030 | Glucosinolate biosynthesis | Gigolashvili et al. (2009); Sawada et al. (2009) |
| NRT/PTR | Glucosinolate transporter | At3g47960 | Glucosinolate biosynthesis | Nour-Eldin and Halkier (2013) |
| AtABCG29/PDR1 | Monolignol transporter | At3g16340 | Lignin biosynthesis | Alejandro et al. (2012) |

In this study the authors used comparative transcriptome analyses between a C3
plant species, Flaveria pringlei, and the closely related C4 species F. trinervia and F. bidentis to identify three novel genes that are abundant in the C4 species and predicted to encode chloroplast membrane proteins. Unlike C3 plant species, which only contain a sodium-dependent pyruvate transporter, both sodium-dependent and sodium-independent pyruvate transport have been reported in a range of C4 species (Aoki et al., 1992). Given this, Furumoto et al. (2011) used their cross-species analyses to search for the gene encoding the sodium-dependent pyruvate transporter using the following criteria: (i) given its essential role in C4 photosynthesis, it should be expressed at considerably higher levels in C4 than in C3 plants; and (ii) its expression should be low in proton-dependent C4 species but consistently high in species displaying sodium-dependent pyruvate transport. Wider comparative transcriptomics allowed one of the three candidate genes to be excluded. Crucially, functional analysis, based on its efficient import of pyruvate and on the physiological characterization of an Arabidopsis mutant, revealed BAT1 to be the plastidial sodium-dependent pyruvate carrier (Furumoto et al., 2011). In this study the authors were further able to pinpoint BAT1 as functioning in C4 plants and in the methylerythritol phosphate pathway of C3 plants. The search for the mitochondrial pyruvate transporter is, however, ongoing.
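The two selection criteria of Furumoto et al. (2011) can be expressed as a simple filter. The fold-change cutoffs, the grouping of species into a C3 reference, sodium-dependent C4, and proton-dependent C4 classes, the use of sodium-type C4 expression for criterion (i), and the expression values below are all hypothetical assumptions chosen for illustration, not values from the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical expression summary for one candidate gene."""
    name: str
    c3: float         # expression in the C3 reference species
    c4_sodium: float  # mean expression in C4 species with Na+-dependent pyruvate uptake
    c4_proton: float  # mean expression in C4 species with H+-dependent pyruvate uptake


def passes_criteria(c, min_c4_over_c3=5.0, max_proton_over_sodium=0.2):
    """Apply both criteria using hypothetical cutoffs."""
    # (i) much higher expression in (Na+-type) C4 species than in the C3 species
    enriched_in_c4 = c.c4_sodium >= min_c4_over_c3 * c.c3
    # (ii) low in H+-dependent C4 species relative to Na+-dependent C4 species
    low_in_proton_type = c.c4_proton <= max_proton_over_sodium * c.c4_sodium
    return enriched_in_c4 and low_in_proton_type


candidates = [
    Candidate("cand_1", c3=2.0, c4_sodium=40.0, c4_proton=3.0),
    Candidate("cand_2", c3=2.0, c4_sodium=40.0, c4_proton=35.0),
    Candidate("cand_3", c3=10.0, c4_sodium=15.0, c4_proton=1.0),
]
print([c.name for c in candidates if passes_criteria(c)])
```

Here cand_2 fails criterion (ii) and cand_3 fails criterion (i). As in the study itself, genes passing such a filter are only candidates that still require functional confirmation.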

Lignin transporters 

To identify genes involved in monolignol transport, Alejandro et al. (2012) performed a co-expression network analysis of the Arabidopsis ABCG transporter subfamily (previously called WBCs and PDRs) using the ATTED-II database (http://atted.jp/). Given that members of the ABCG subfamily have been shown to transport a broad range of fatty acids and terpenoids, they asked whether this class could also be implicated in the transport of phenolic compounds. The results revealed that AtABCG29/PDR1, a member of the full-size ABCG subfamily, was highly co-expressed with three genes of the phenylpropanoid biosynthesis pathway, which is involved in the synthesis of lignin and flavonoids. These well-correlated genes correspond to two 4-coumarate:coenzyme A (CoA) ligases (4CL2 and 4CL5), which convert hydroxycinnamic acids into hydroxycinnamoyl-CoA esters, and one caffeoyl-CoA O-methyltransferase, which catalyzes the conversion of caffeoyl-CoA into feruloyl-CoA. Moreover, seven further genes related to phenylpropanoid biosynthesis are co-expressed with AtABCG29, albeit with lower co-expression ratios. In concordance with these results, Ehlting et al. (2005) also reported that AtABCG29 showed an expression pattern in primary stems consistent with that of monolignol biosynthetic genes and with increasing lignin content. Subsequent characterization of AtABCG29 revealed that yeast expressing this transporter exhibited increased tolerance to p-coumaryl alcohol by excreting this monolignol, whereas AtABCG29-deficient mutants contained less lignin and showed modifications in secondary metabolites, underlining the importance of p-coumaryl alcohol levels in the cytosol (Alejandro et al., 2012). Similarly, a targeted co-expression analysis searching for transporters specifically and highly expressed in the phloem was used alongside metabolome analyses to reveal that ABCG9, ABCG11, and ABCG14 are involved in the regulation of lipid/sterol homeostasis (Le Hir et al., 2013).
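ATTED-II reports co-expression as a Mutual Rank (MR): the geometric mean of the rank of gene B in gene A's correlation-ordered partner list and the rank of A in B's list, with lower values indicating stronger co-expression. The sketch below computes MR from Pearson correlations on hypothetical toy profiles and counts, for each candidate ABCG gene, how many pathway genes fall under an arbitrary MR cutoff. It does not reproduce the ATTED-II pipeline (dataset selection and normalization are omitted), and all gene names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom


def mutual_rank(profiles):
    """MR(a, b) = sqrt(rank of b in a's partner list * rank of a in b's list),
    where partners are ordered by decreasing Pearson r (rank 1 = best)."""
    genes = list(profiles)
    rank = {}
    for g in genes:
        partners = sorted(
            (h for h in genes if h != g),
            key=lambda h: pearson(profiles[g], profiles[h]),
            reverse=True,
        )
        for position, h in enumerate(partners, start=1):
            rank[(g, h)] = position
    return {(g, h): sqrt(rank[(g, h)] * rank[(h, g)]) for (g, h) in rank}


def count_pathway_partners(mr, candidates, pathway_genes, cutoff):
    """For each candidate, count the pathway genes with MR <= cutoff."""
    return {c: sum(1 for p in pathway_genes if mr[(c, p)] <= cutoff) for c in candidates}


# Hypothetical profiles: three pathway genes, two candidate transporters, two unrelated genes.
profiles = {
    "pp_1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "pp_2": [1.1, 2.1, 2.9, 4.2, 4.9, 6.0],
    "pp_3": [0.9, 1.9, 3.2, 3.8, 5.1, 6.1],
    "abcg_X": [1.0, 2.2, 3.0, 4.1, 5.0, 5.9],
    "abcg_Y": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "other_1": [5.9, 5.1, 3.9, 3.1, 2.0, 1.1],
    "other_2": [6.1, 4.8, 4.1, 2.9, 2.2, 0.9],
}
mr = mutual_rank(profiles)
counts = count_pathway_partners(mr, ["abcg_X", "abcg_Y"], ["pp_1", "pp_2", "pp_3"], cutoff=3.0)
print(counts)
```

Because MR depends on each gene's whole ranked partner list, it is symmetric and less sensitive than raw correlation to genes that correlate moderately with everything.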

As stated above, a vast number of transport proteins remain uncharacterized, and physiologically important transporters such as the mitochondrial pyruvate and folate transporters, as well as practically all amino acid transporters, remain to be molecularly characterized. The examples presented here suggest that the co-expression approach will be useful for identifying genes encoding transporters of specific metabolites.

Use of co-expression analysis in pinpointing process-related transporters

With the exception of the photorespiratory transporters described above, most of the uses of co-expression we have described thus far relate to the identification of (metabolite-)specific transporters; however, the utility of the approach goes far beyond this application. The photorespiratory transporters are the best example to date of taking a broader approach, but several further studies have followed this route, albeit not to such a conclusive end. Three examples come from our own work, in which we looked at (i) genes co-expressed during dark-induced senescence (Araújo et al., 2010; Araújo et al., 2011), (ii) genes co-expressed following exposure to high levels of several light species, including UV-B irradiance (Tohge et al., 2011), and (iii) genes co-expressed with barley tonoplast proteins (Tohge et al., 2011). The first two approaches identified nine transporters and nine transport-related proteins as putative membrane transporters involved in senescence and in UV-B-responsive phenolic secondary metabolism, respectively. The senescence study showed considerable overlap in targets with a co-expression and cis-regulatory element analysis of mitochondrially associated proteins following the imposition of a broad range of mitochondrial stresses (Holt et al., 2006), providing further support for the putative functional assignments we suggested. The tonoplast study was, however, slightly more complicated in that it formed clusters on the basis of already identified tonoplast proteins, but it suggested functions including the transport of phenylpropanoids (a multidrug resistance-type transporter and an H+-dependent transporter) and mugineic acid (an ABC transporter and a transport-related protein belonging to the glutathione S-transferase gene family). It is important to note, however, that these candidate genes have yet to be validated by functional analysis.

CONCLUSION AND OUTLOOK 

Recent years have seen impressive advances in our understanding of transport protein function; however, many gaps remain (Weber and Linka, 2011; Rolland et al., 2012; Sweetlove and Fernie, 2013). Although the co-expression approach has been used effectively for predicting transport function, and is at least partially responsible for many of the discoveries reviewed in this article, it probably remains an underexploited tool. Layering datasets beyond those at the transcriptional level (Tohge and Fernie, 2012), alongside more sophisticated cross-species comparisons such as that illustrated in the Furumoto study, will likely make specific pathway- or process-based questions more tractable. That said, the recent characterization of the plant ammonium transceptor (De Michele et al., 2013) suggests, at least in theory, that such approaches may ultimately also prove powerful in linking transporters to signal transduction cascades and the processes they control.
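As a minimal illustration of layering a second data type onto co-expression, the sketch below keeps a candidate only if its transcript profile both correlates with the bait genes and tracks the level of a target metabolite measured in the same samples. The thresholds, the use of the minimum correlation across baits, and all profiles are hypothetical assumptions, not a method taken from the studies cited.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom


def integrated_candidates(transcripts, metabolite, baits, min_r_bait=0.6, min_r_metab=0.6):
    """Return {gene: (r_bait, r_metab)} for non-bait genes that pass both the
    transcript-transcript and the transcript-metabolite thresholds."""
    hits = {}
    for gene, profile in transcripts.items():
        if gene in baits:
            continue
        r_bait = min(pearson(profile, transcripts[b]) for b in baits)  # weakest bait link
        r_metab = pearson(profile, metabolite)
        if r_bait >= min_r_bait and r_metab >= min_r_metab:
            hits[gene] = (r_bait, r_metab)
    return hits


# Hypothetical transcript and metabolite levels across the same six samples.
transcripts = {
    "bait_1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "cand_A": [5.5, 3.5, 4.5, 5.5, 6.5, 10.5],  # tracks both bait and metabolite
    "cand_B": [3.0, 1.0, 1.0, 1.0, 1.0, 3.0],   # tracks the metabolite only
    "cand_C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],   # tracks the bait only
}
metabolite_level = [3.0, 1.0, 1.0, 1.0, 1.0, 3.0]
hits = integrated_candidates(transcripts, metabolite_level, baits=["bait_1"])
print(hits)
```

Requiring agreement across two data types discards cand_B and cand_C, each supported by only one layer; the same pattern extends to proteomic data by adding further correlation checks.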